
support qwen3-omni-moe #2342

Open

zhang-cheng09 wants to merge 3 commits into NVIDIA-NeMo:main from zhang-cheng09:qwen3_omni_moe

Conversation


@zhang-cheng09 zhang-cheng09 commented Feb 12, 2026

What does this PR do ?

Add a one line overview of what this PR aims to accomplish.

Changelog

  • Add specific line by line info of high level changes in this PR.

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Summary by CodeRabbit

Release Notes

  • New Features
    • Added support for Qwen3-Omni multimodal model with text, vision, audio, and video processing capabilities.
    • Introduced fine-tuning configuration for Qwen3-Omni-30B model variant.
    • Enhanced distributed training support with context parallelism utilities for efficient multi-rank data handling.

Signed-off-by: zhangcheng <chzhang_bj@163.com>

copy-pr-bot bot commented Feb 12, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Contributor

coderabbitai bot commented Feb 12, 2026

📝 Walkthrough

This PR introduces comprehensive support for Qwen3 Omni MoE multimodal models in Megatron-Bridge. It adds model architecture implementation, configuration classes, HuggingFace-to-Megatron bridging, vision/audio processing utilities, context parallelism helpers, and fine-tuning recipes for the Qwen3-Omni-30B-A3B variant.

Changes

Cohort / File(s) Summary
Public API Registration
src/megatron/bridge/models/__init__.py, src/megatron/bridge/models/qwen_omni/__init__.py
Wires up Qwen3OmniMoe classes (Model, Bridge, Provider) for public export and module initialization.
Core Model Implementation
src/megatron/bridge/models/qwen_omni/modelling_qwen3_omni.py, src/megatron/bridge/models/qwen_omni/thinker_model.py
Implements Qwen3OmniMoeModel wrapper facade and Qwen3OmniMoeThinkerModel orchestrating vision, audio, and language components with complex forward pass logic including multimodal fusion, preprocessing, and parallelism handling.
Bridge & Provider
src/megatron/bridge/models/qwen_omni/qwen3_omni_bridge.py, src/megatron/bridge/models/qwen_omni/qwen3_omni_provider.py
Defines Qwen3OmniMoeBridge for HF-to-Megatron parameter mapping and Qwen3OmniMoeModelProvider dataclass-based factory with extensive MoE hyperparameters, token IDs, and lifecycle methods (finalize, provide, provide_language_model).
Configuration
src/megatron/bridge/models/qwen_omni/transformer_config.py
Adds Qwen3OmniTransformerConfig extending TransformerConfig with multimodal hyperparameters including vision/audio token IDs, rotary embedding options, MRoPE sections, and optional HF vision configuration.
Utilities
src/megatron/bridge/models/qwen_omni/context_parallel_utils.py, src/megatron/bridge/models/qwen_omni/utils.py
Implements context parallelism utilities (expand/collapse_thw, parallel split, all_gather vision embeddings) and rope/position handling (get_rope_index, get_llm_pos_ids_for_vision) for multimodal sequence processing across distributed ranks.
Fine-tuning Recipes
src/megatron/bridge/recipes/qwen_vl/qwen3_vl.py
Adds qwen3_omni_30b_a3b_finetune_config helper for Qwen3-Omni-30B-A3B-Instruct model with expert-model-parallel-size 8 and freezing of language/vision modules.
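
For orientation, here is a minimal usage sketch of the new recipe helper. The import path follows the file listed above; keyword overrides are omitted since the accepted kwargs are not shown here, so treat this as an illustrative assumption rather than a verified invocation.

```python
# Minimal sketch, assuming the helper is importable from the recipe module listed above.
from megatron.bridge.recipes.qwen_vl.qwen3_vl import qwen3_omni_30b_a3b_finetune_config

# Builds a fine-tuning ConfigContainer for Qwen3-Omni-30B-A3B-Instruct with the
# recipe defaults (expert-model-parallel-size 8, frozen language/vision modules).
cfg = qwen3_omni_30b_a3b_finetune_config()
```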

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Model as Qwen3OmniMoeModel
    participant Thinker as Thinker Model
    participant Vision as Vision Encoder
    participant Audio as Audio Encoder
    participant Language as Language Model
    
    Client->>Model: forward(input_ids, pixel_values, input_features, ...)
    activate Model
    Model->>Thinker: forward(all inputs)
    activate Thinker
    
    Note over Thinker: Preprocessing Stage
    Thinker->>Vision: Encode images/videos
    activate Vision
    Vision-->>Thinker: vision_embeddings
    deactivate Vision
    
    Thinker->>Audio: get_audio_features(input_features)
    activate Audio
    Audio-->>Thinker: audio_embeddings
    deactivate Audio
    
    Note over Thinker: Multimodal Fusion
    Thinker->>Thinker: Fuse vision + audio + text embeddings
    Thinker->>Thinker: get_rope_index for position IDs
    
    Note over Thinker: Decode Stage
    Thinker->>Language: forward(fused_embeddings, input_ids, position_ids, ...)
    activate Language
    Language-->>Thinker: output_logits
    deactivate Language
    
    Thinker-->>Model: output_logits
    deactivate Thinker
    
    Model-->>Client: output_logits
    deactivate Model
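As a rough companion to the diagram, the sketch below shows how a caller might forward a multimodal batch by keyword. It is illustrative only: the argument names are taken from parameters mentioned elsewhere in this PR, and the real forward signature in modelling_qwen3_omni.py / thinker_model.py may differ.

```python
import torch


def run_omni_forward(model, batch: dict[str, torch.Tensor]) -> torch.Tensor:
    """Illustrative wrapper: forward a multimodal batch by keyword.

    Argument names are assumptions based on parameters referenced in this PR;
    check the actual signature before relying on them.
    """
    return model(
        input_ids=batch["input_ids"],                        # text tokens incl. vision/audio placeholders
        position_ids=batch.get("position_ids"),              # MRoPE positions, e.g. from get_rope_index
        attention_mask=batch.get("attention_mask"),
        pixel_values=batch.get("pixel_values"),              # image/video patches
        image_grid_thw=batch.get("image_grid_thw"),          # per-image (t, h, w) grid sizes
        input_features=batch.get("input_features"),          # audio mel features
        feature_attention_mask=batch.get("feature_attention_mask"),
    )
```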

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

r0.3.0

Suggested reviewers

  • erhoo82
  • malay-nagda
  • yaoyu-33
🚥 Pre-merge checks | ✅ 2 | ❌ 2
❌ Failed checks (2 warnings)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 43.33%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Test Results For Major Changes (⚠️ Warning): The PR introduces major changes (1,400+ lines, 7 new modules for Qwen3 Omni MoE support), but the PR description contains only the template with no test results, numerical validation, or performance benchmarks. Resolution: update the PR description to include test results validating model functionality, numerical validation ensuring no convergence regressions, performance benchmarks if applicable, and a proper changelog entry.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check (✅ Passed): The title 'support qwen3-omni-moe' clearly and accurately summarizes the main change: adding support for the Qwen3 Omni MoE model across multiple new files, configurations, and utilities.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 18

🤖 Fix all issues with AI agents
In `@src/megatron/bridge/models/qwen_omni/context_parallel_utils.py`:
- Around line 1-4: Add the missing Apache-2.0 / NVIDIA copyright header at the
top of the module context_parallel_utils.py (before the first import) so the
file follows project policy; update the header text consistent with other files
in the repo (include the Apache 2.0 license notice and NVIDIA copyright line),
and ensure there is a blank line after the header before the existing imports
(torch and megatron.core.parallel_state as mpu).
- Around line 193-201: The function split_data_cp_rank currently asserts on
val.shape before checking whether val is None, and it declares cp_rank with an
implicit Optional. Fix by moving the None check first (put the "if val is None:
return val" at the top of split_data_cp_rank, before any access to val.shape)
or, if val should never be None, remove that guard and require non-None callers;
also update the signature from cp_rank: int = None to cp_rank: int | None = None
and retain the fallback cp_rank = mpu.get_context_parallel_rank() logic so
behavior is unchanged (a minimal sketch follows after this list).

In `@src/megatron/bridge/models/qwen_omni/modelling_qwen3_omni.py`:
- Around line 41-42: The constructor accepts talker_transformer_config and
code2wav_transformer_config but never uses or stores them; either remove these
parameters from the __init__ signature or persist/forward them (e.g., assign to
self.talker_transformer_config and self.code2wav_transformer_config or pass them
into the appropriate submodule constructors) so callers' values take effect; if
they are placeholders, add a clear TODO comment in the class constructor noting
intended future use (reference the parameter names talker_transformer_config and
code2wav_transformer_config in the class constructor to locate the change).
- Around line 95-141: Change the positional forwarding in forward to explicit
keyword forwarding so arguments are passed by name to self.thinker (e.g.,
self.thinker(input_ids=input_ids, input_features=input_features,
position_ids=position_ids, ... , **kwargs)) to avoid brittle positional
coupling; also fix type annotations: change extra_block_kwargs: dict = None ->
dict | None = None, cp_img_num: list[int] = None -> list[int] | None = None, and
add missing optional types for parameters referenced in this method such as
input_features: torch.Tensor | None = None, inference_params: InferenceParams |
None = None, packed_seq_params: PackedSeqParams | None = None,
feature_attention_mask: torch.Tensor | None = None, audio_feature_lengths:
torch.Tensor | None = None, use_audio_in_video: bool | None = None, and
video_second_per_grid: float | None = None so static analysis (RUF013) is
satisfied while keeping parameter names matching those passed to self.thinker.

In `@src/megatron/bridge/models/qwen_omni/qwen3_omni_bridge.py`:
- Around line 15-35: Remove the unused imports at the top of
qwen3_omni_bridge.py: drop logging, Dict/Mapping/Union from typing, torch.nn,
megatron.core.parallel_state, WeightConversionTask, and
extract_expert_number_from_param; keep only the actual used symbols such as
Qwen3OmniMoeForConditionalGeneration, MegatronMappingRegistry,
Qwen3OmniMoeModel, MegatronModelBridge, the param mapping classes (AutoMapping,
ConcatenatedQKVMapping, GatedMLPMapping, QKVMapping, ReplicatedMapping),
PreTrainedVLM, and Qwen3OmniMoeModelProvider so imports reflect real usage in
this module.

In `@src/megatron/bridge/models/qwen_omni/qwen3_omni_provider.py`:
- Around line 144-146: The docstring for the provide method incorrectly says
"Qwen3VL MoE"; update the string to "Qwen3 Omni MoE" to accurately describe the
model. Locate the provide method in qwen3_omni_provider.py (function provide)
and replace the phrase "Qwen3VL MoE" with "Qwen3 Omni MoE" in its docstring,
ensuring the rest of the description remains unchanged.
- Line 24: Replace the old typing imports and annotations: remove "from typing
import List, Optional" and update any annotations using List[...] to use
built-in list[...] and Optional[X] to use X | None (e.g., change List[str] ->
list[str], Optional[int] -> int | None). Apply these edits to the import line
and all type hints in the Qwen3OmniProvider implementation (fields and function
signatures referenced around the earlier import and at the sections identified
near lines 41–44, 84, and 108) so the module uses Python 3.10+ built-in generics
and the T | None nullable syntax.

In `@src/megatron/bridge/models/qwen_omni/thinker_model.py`:
- Around line 7-11: The file header in thinker_model.py has an incomplete Apache
2.0 license block; update the top-of-file license to match the complete header
used elsewhere (e.g., utils.py) by adding the missing sentences "distributed
under the License is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES OR
CONDITIONS OF ANY KIND, either express or implied." so the header includes the
full Apache clause and mirrors the other files' license text.
- Around line 323-342: The audio embeddings variable audio_embeds is never set
after computing audio_features, so audio modality is dropped; assign
audio_embeds from the computed audio_features (e.g., set audio_embeds =
audio_features or convert audio_features into the same shape expected by
combined_embeddings) after calling get_audio_features in the block where
input_features is not None, ensure audio_mask is computed using input_ids ==
self.audio_token_id, and then the existing conditional that writes
combined_embeddings[audio_mask] = audio_embeds will correctly insert the audio
embeddings; reference functions/variables: get_audio_features, audio_features,
audio_embeds, audio_mask, combined_embeddings, audio_feature_lengths.
- Line 13: Remove the Optional import and switch all nullable type annotations
to PEP 604 unions (X | None); specifically update the function signatures that
currently use implicit Optional defaults (e.g., parameters declared as "dict =
None" and "list[int] = None") to use explicit union annotations like "dict |
None" and "list[int] | None", and update get_audio_features signature to use the
same pattern; ensure you remove the now-unused Optional import from the top of
thinker_model.py and update any affected type hints accordingly.
- Around line 205-211: The code can raise AttributeError when both masks are
None: in get_audio_features the branch sets audio_feature_lengths = None and
then computes feature_lens = audio_feature_lengths if ... else
feature_attention_mask.sum(-1), which calls .sum on None; fix by making the else
path compute feature_lens from input_features instead of feature_attention_mask
(e.g. set feature_lens = torch.full((input_features.size(0),),
input_features.size(-1), dtype=torch.long, device=input_features.device) or
similar) and only use feature_attention_mask.sum(-1) when feature_attention_mask
is not None; update the logic around audio_feature_lengths/feature_lens to use
input_features, feature_attention_mask, and audio_feature_lengths safely.
- Around line 294-298: The fallback zero-tensor uses a hardcoded dtype
(torch.bfloat16); change it to use the language model's actual dtype so it
matches runtime precision. In the method creating vision_embeds (look for the
vision_embeds = torch.zeros(...) fallback in thinker_model.py) obtain the model
dtype via a safe lookup such as: model_dtype = getattr(self.language_model,
"dtype", None) or getattr(self.language_model.config, "torch_dtype", None) or
fallback to next(self.language_model.parameters()).dtype, then create the zeros
with dtype=model_dtype; apply the same change to the other fallback zero-tensor
occurrence so both use the model's dtype rather than torch.bfloat16.
- Around line 141-145: The assert currently passes a tuple as the message which
prints oddly; update the assert in thinker_model.py that references
vision_transformer_config.vision_config.deepstack_visual_indexes and
self.language_model.decoder.layers so the second argument is a single formatted
string (e.g., an f-string) combining the descriptive texts and values instead of
a tuple; keep the same condition and include both counts in the single message
for clear output.

In `@src/megatron/bridge/models/qwen_omni/transformer_config.py`:
- Around line 15-16: The file imports and type annotations use typing.List and
typing.Optional; update the import and annotations in transformer_config.py to
use built-in generics and PEP 604 union syntax: replace any "List[T]" with
"list[T]" and "Optional[T]" with "T | None" (remove List and Optional from the
typing import), and update all corresponding fields/usages in the dataclass
(e.g., the TransformerConfig dataclass and the other three annotated attributes
referenced in the review) to use the new forms.
- Around line 18-23: Replace the direct import of TransformerConfig from
megatron.core.transformer.transformer_config with the bridge wrapper import used
across providers: import TransformerConfig from
megatron.bridge.models.transformer_config so Qwen3OmniTransformerConfig inherits
the bridge's wrapper class (update the import statement that currently
references TransformerConfig and keep Qwen3OmniTransformerConfig as-is).

In `@src/megatron/bridge/models/qwen_omni/utils.py`:
- Line 16: Replace the use of typing.Optional with Python 3.10 union syntax:
remove the Optional import and update any function/type hints that use
Optional[...] to use X | None instead; specifically change the get_rope_index
signature and any other occurrences in the same file (around the block
referenced at lines 60-66) to use the `Type | None` form (e.g., replace
Optional[int] with int | None) and ensure imports reflect removal of Optional.
- Around line 86-88: The loop reuses the outer-scope name input_ids (for i,
input_ids in enumerate(total_input_ids)), which gets masked and then later used
incorrectly; rename the loop variable to _input_ids (for i, _input_ids in
enumerate(total_input_ids)), apply the attention mask to a local masked_ids =
_input_ids[_mask == 1] (or similar) and replace all downstream references inside
the loop from input_ids to the new name (e.g., masked_ids or _input_ids) so you
no longer shadow the outer variable; also change the places that rely on the
original tensor/device/length (notably the code that uses len(input_ids) and
input_ids.device around the delta calculation) to use total_input_ids (or
total_input_ids.device / len(total_input_ids)) instead.

In `@src/megatron/bridge/recipes/qwen_vl/qwen3_vl.py`:
- Line 251: The hf_path in the config is a local relative path (hf_path) instead
of the HuggingFace hub identifier expected by AutoBridge.from_hf_pretrained;
replace "../hf-hub/Qwen/Qwen3-Omni-30B-A3B-Instruct" with the official HF repo
id string (e.g., "Qwen/Qwen3-Omni-30B-A3B-Instruct" or the correct hub name used
elsewhere) so AutoBridge.from_hf_pretrained can resolve the model from the hub;
update the hf_path value in the qwen3_vl.py config to use the hub identifier.
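
To make the context_parallel_utils item above concrete (see the forward reference in that bullet), here is a minimal sketch of the suggested guard ordering and signature for split_data_cp_rank. The cp_size parameter and the slicing itself are assumptions for illustration, not the PR's actual logic.

```python
import torch
from megatron.core import parallel_state as mpu


def split_data_cp_rank(val: torch.Tensor | None, cp_size: int, cp_rank: int | None = None) -> torch.Tensor | None:
    """Return this context-parallel rank's slice of ``val`` along the sequence dim.

    Sketch only: the guard ordering and signature follow the review suggestion;
    the slicing body is a placeholder and may differ from the PR's implementation.
    """
    # Check for None before touching val.shape, as the review suggests.
    if val is None:
        return None
    if cp_rank is None:
        cp_rank = mpu.get_context_parallel_rank()
    assert val.shape[0] % cp_size == 0, f"sequence length {val.shape[0]} not divisible by cp_size {cp_size}"
    chunk = val.shape[0] // cp_size
    return val[cp_rank * chunk : (cp_rank + 1) * chunk]
```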
🧹 Nitpick comments (12)
src/megatron/bridge/recipes/qwen_vl/qwen3_vl.py (1)

240-245: Docstring is missing default configuration details and _qwen3_vl_common reference.

All other finetune config helpers document the default TP/PP/EP/LR settings for LoRA/DoRA vs. Full SFT and point users to _qwen3_vl_common for the full parameter list. This docstring omits both.

Suggested docstring
 def qwen3_omni_30b_a3b_finetune_config(**user_kwargs: Unpack[Qwen3VLCommonKwargs]) -> ConfigContainer:
     """Return a fine-tuning config for Qwen/Qwen3-Omni-30B-A3B-Instruct.
 
     This is a Mixture-of-Experts model with 128 experts and top-8 routing.
     Recommended to use with expert parallelism (EP) for efficient training.
+
+    Default configuration: 1 node, 8 GPUs
+    - LoRA/DoRA: TP=1, PP=1, EP=8, LR=2e-4
+    - Full SFT: TP=1, PP=1, EP=8, LR=2e-5
+
+    See `_qwen3_vl_common` for the full list of parameters.
     """
src/megatron/bridge/models/qwen_omni/utils.py (3)

21-29: Missing type hints and incomplete docstring for _get_feat_extract_output_lengths.

The function lacks parameter and return type annotations, and the docstring doesn't explain the magic constants (100, 13, the two pooling steps). Even for an internal helper, a brief note about the source of these constants (e.g., the specific audio encoder architecture) would aid maintainability.


32-49: Missing return type annotation for get_llm_pos_ids_for_vision.

 def get_llm_pos_ids_for_vision(
     ...
-):
+) -> torch.Tensor:

As per coding guidelines: "Use type hints for function arguments and return types."


87-87: Redundant None check — attention_mask is guaranteed non-None at this point.

Line 75–76 already set attention_mask = torch.ones_like(total_input_ids) when it's None, so the check on line 87 is always true. Remove it to avoid confusion.

src/megatron/bridge/models/qwen_omni/context_parallel_utils.py (3)

89-89: @torch.no_grad should be @torch.no_grad().

Using torch.no_grad without parentheses works incidentally in recent PyTorch but is unconventional. The canonical decorator form is @torch.no_grad().

-@torch.no_grad
+@torch.no_grad()

51-51: Add strict=True to zip() call to catch length mismatches.

Per Ruff B905, zip(pixel_values, image_grid_thws) should specify strict=True since the assert on line 47 already validates equal lengths.

-    for pixel_value, image_grid_thw in zip(pixel_values, image_grid_thws):
+    for pixel_value, image_grid_thw in zip(pixel_values, image_grid_thws, strict=True):

159-190: AllGatherVisionEmbeddings: the ctx handling and missing type annotations deserve a closer look.

ctx.save_for_backward(*seqlens_on_parallel_ranks) saves only the seqlens; parallel_group is not saved, but the backward pass does not need it, so that part is fine. However, the forward/backward staticmethods lack argument type annotations and return type hints.

More importantly, ctx.parallel_rank is stored as a plain attribute; verify that this survives serialization if checkpointing is used with torch.autograd.graph.save_on_cpu.
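
For context, a generic sketch of the autograd all-gather pattern under review follows. It is not the PR's AllGatherVisionEmbeddings: it assumes one sequence length per rank and pads chunks to a common length so the collective stays simple, then routes the gradient of the concatenated result back to each rank's local chunk.

```python
import torch
import torch.distributed as dist


class AllGatherWithGrad(torch.autograd.Function):
    """Generic sketch: all-gather per-rank embeddings with a gradient path back."""

    @staticmethod
    def forward(ctx, local_embeds: torch.Tensor, seqlens: list[int], group=None) -> torch.Tensor:
        rank = dist.get_rank(group)
        ctx.rank = rank            # plain ctx attribute, as discussed above
        ctx.seqlens = seqlens
        max_len = max(seqlens)

        # Pad to a common length so every rank contributes an equal-sized buffer.
        padded = torch.zeros(max_len, *local_embeds.shape[1:],
                             dtype=local_embeds.dtype, device=local_embeds.device)
        padded[: local_embeds.shape[0]] = local_embeds
        gathered = [torch.empty_like(padded) for _ in seqlens]
        dist.all_gather(gathered, padded, group=group)

        # Trim each rank's chunk back to its true length and concatenate.
        return torch.cat([g[:n] for g, n in zip(gathered, seqlens)], dim=0)

    @staticmethod
    def backward(ctx, grad_output: torch.Tensor):
        # Slice out the gradient belonging to this rank's local chunk.
        start = sum(ctx.seqlens[: ctx.rank])
        end = start + ctx.seqlens[ctx.rank]
        return grad_output[start:end].contiguous(), None, None
```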

src/megatron/bridge/models/qwen_omni/qwen3_omni_provider.py (1)

150-151: Remove or explain commented-out code.

Line 151 has # vision_transformer_config = deepcopy(self) with the placeholder comment "placeholder for future use" but no explanation of why it is retained. As per coding guidelines, either add a clear comment explaining the intent and why it's commented out, or remove it.

src/megatron/bridge/models/qwen_omni/modelling_qwen3_omni.py (4)

15-19: Unused imports and import ordering.

mpu and tensor_parallel (line 19) are not used in this file — they are used by the thinker, not this facade. Also, the typing stdlib import should precede the third-party torch import per the project's import-ordering guideline.

Proposed fix
-import torch
-from typing import Optional
+from typing import Optional
 
-from megatron.core.process_groups_config import ProcessGroupCollection
-from megatron.core import InferenceParams, mpu, tensor_parallel
+import torch
+
+from megatron.core import InferenceParams
+from megatron.core.process_groups_config import ProcessGroupCollection

41-42: Use T | None instead of Optional[T].

Per coding guidelines: use T | None for nullable types. This also lets you drop the from typing import Optional import.

-        talker_transformer_config: Optional[Qwen3OmniMoeTalkerConfig] = None,
-        code2wav_transformer_config: Optional[Qwen3OmniMoeCode2WavConfig] = None,
+        talker_transformer_config: Qwen3OmniMoeTalkerConfig | None = None,
+        code2wav_transformer_config: Qwen3OmniMoeCode2WavConfig | None = None,

As per coding guidelines, "Use 'T | None' for nullable types instead of 'Optional[T]'".


1-1: Copyright year is 2025; current year is 2026.

Consider updating to 2026 (or 2025-2026) to reflect the actual creation date of this new file.


33-35: Class docstring is sparse for a public API entity.

Since Qwen3OmniMoeModel is the top-level public model class (exported via __init__.py), consider briefly documenting its role as a facade around the thinker model, and noting that talker/code2wav support is planned.

Comment on lines 41 to 42
        talker_transformer_config: Optional[Qwen3OmniMoeTalkerConfig] = None,
        code2wav_transformer_config: Optional[Qwen3OmniMoeCode2WavConfig] = None,
Contributor


⚠️ Potential issue | 🟡 Minor

Unused constructor arguments talker_transformer_config and code2wav_transformer_config.

These parameters are accepted but neither stored nor forwarded to any submodule. If they are placeholders for upcoming talker/code2wav support, consider either:

  • Adding a comment/TODO explaining the intent, or
  • Removing them until the functionality is implemented, to avoid a misleading public API.

Callers currently must supply (or default) these configs with no effect, which is confusing.

🧰 Tools
🪛 Ruff (0.15.0)

[warning] 41-41: Unused method argument: talker_transformer_config

(ARG002)


[warning] 42-42: Unused method argument: code2wav_transformer_config

(ARG002)

🤖 Prompt for AI Agents
In `@src/megatron/bridge/models/qwen_omni/modelling_qwen3_omni.py` around lines 41
- 42, The constructor accepts talker_transformer_config and
code2wav_transformer_config but never uses or stores them; either remove these
parameters from the __init__ signature or persist/forward them (e.g., assign to
self.talker_transformer_config and self.code2wav_transformer_config or pass them
into the appropriate submodule constructors) so callers' values take effect; if
they are placeholders, add a clear TODO comment in the class constructor noting
intended future use (reference the parameter names talker_transformer_config and
code2wav_transformer_config in the class constructor to locate the change).
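
If the placeholder route is chosen, the change could be as small as the constructor sketch below; other parameters and setup are elided, and the attribute names are illustrative rather than the PR's actual code.

```python
# Sketch of the relevant constructor lines only; everything else is unchanged.
def __init__(
    self,
    talker_transformer_config: Qwen3OmniMoeTalkerConfig | None = None,
    code2wav_transformer_config: Qwen3OmniMoeCode2WavConfig | None = None,
    # ... other existing parameters unchanged ...
) -> None:
    # ... existing setup unchanged ...
    # TODO(qwen3-omni): forward these configs to talker/code2wav submodules once implemented.
    self.talker_transformer_config = talker_transformer_config
    self.code2wav_transformer_config = code2wav_transformer_config
```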

Comment on lines 15 to 16
from dataclasses import dataclass, field
from typing import List, Optional
Contributor


🛠️ Refactor suggestion | 🟠 Major

Use built-in generics and T | None syntax.

Same issue as other files: replace List with list and Optional with | None.

Proposed fix
 from dataclasses import dataclass, field
-from typing import List, Optional
-    deepstack_visual_indexes: List[int] = field(default_factory=lambda: [8, 16, 24])
+    deepstack_visual_indexes: list[int] = field(default_factory=lambda: [8, 16, 24])
     ...
-    mrope_section: List[int] = field(default_factory=lambda: [24, 20, 20])
+    mrope_section: list[int] = field(default_factory=lambda: [24, 20, 20])
     ...
-    hf_text_config: Optional[Qwen3VLTextConfig] = None
+    hf_text_config: Qwen3VLTextConfig | None = None

As per coding guidelines: "Use built-in generics (list, dict, tuple) instead of typing equivalents" and "Use 'T | None' for nullable types instead of 'Optional[T]'."

Also applies to: 37-37, 44-44, 53-53

🤖 Prompt for AI Agents
In `@src/megatron/bridge/models/qwen_omni/transformer_config.py` around lines 15 -
16, The file imports and type annotations use typing.List and typing.Optional;
update the import and annotations in transformer_config.py to use built-in
generics and PEP 604 union syntax: replace any "List[T]" with "list[T]" and
"Optional[T]" with "T | None" (remove List and Optional from the typing import),
and update all corresponding fields/usages in the dataclass (e.g., the
TransformerConfig dataclass and the other three annotated attributes referenced
in the review) to use the new forms.

Comment on lines +18 to +23
from megatron.core.transformer.transformer_config import TransformerConfig
from transformers.models.qwen3_vl.configuration_qwen3_vl import Qwen3VLTextConfig


@dataclass
class Qwen3OmniTransformerConfig(TransformerConfig):
Contributor


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for direct instantiation of Qwen3OmniTransformerConfig
echo "=== Direct instantiation of Qwen3OmniTransformerConfig ==="
rg -nP 'Qwen3OmniTransformerConfig\(' --type=py

# Search for any instantiation patterns
echo -e "\n=== Usage patterns of Qwen3OmniTransformerConfig ==="
rg -nP 'Qwen3OmniTransformerConfig' --type=py -A 2

# Check the actual bridge TransformerConfig to understand the difference
echo -e "\n=== Bridge TransformerConfig location and structure ==="
find . -name "transformer_config.py" -path "*/megatron/bridge/*" -type f

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 2294


🏁 Script executed:

#!/bin/bash
# Check what other model configs in bridge inherit from
echo "=== Other transformer configs in bridge models ==="
find src/megatron/bridge/models -name "*transformer_config.py" -type f -exec sh -c 'echo "=== {} ===" && head -30 "$1"' _ {} \;

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 3913


🏁 Script executed:

#!/bin/bash
# Examine the Qwen3OmniTransformerConfig usage in provider
echo "=== Qwen3OmniTransformerConfig in qwen3_omni_provider.py ==="
rg -nP 'Qwen3OmniTransformerConfig|language_transformer_config' src/megatron/bridge/models/qwen_omni/qwen3_omni_provider.py -B 2 -A 2

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 762


🏁 Script executed:

#!/bin/bash
# Check if bridge's TransformerConfig is used elsewhere for similar patterns
echo "=== Checking if bridge's TransformerConfig wrapper is used in other configs ==="
rg -nP 'from megatron\.bridge\.models\.transformer_config import TransformerConfig' --type=py

# Verify the bridge's TransformerConfig has the finalize method
echo -e "\n=== Bridge TransformerConfig class definition ==="
rg -nA 10 'class TransformerConfig' src/megatron/bridge/models/transformer_config.py

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 1493


Import TransformerConfig from the bridge's wrapper instead of directly from megatron.core.

Import from src/megatron/bridge/models/transformer_config.py to maintain consistency with other model providers in the bridge (gpt_provider, t5_provider, mamba_provider, distillation_provider). The bridge's TransformerConfig wrapper provides deferred post-initialization semantics that align with the bridge's configuration override system. While Qwen3OmniTransformerConfig is currently only used as a type annotation and never instantiated directly, aligning the import pattern ensures consistency across the codebase and prevents potential issues if direct instantiation is needed in the future.

🤖 Prompt for AI Agents
In `@src/megatron/bridge/models/qwen_omni/transformer_config.py` around lines 18 -
23, Replace the direct import of TransformerConfig from
megatron.core.transformer.transformer_config with the bridge wrapper import used
across providers: import TransformerConfig from
megatron.bridge.models.transformer_config so Qwen3OmniTransformerConfig inherits
the bridge's wrapper class (update the import statement that currently
references TransformerConfig and keep Qwen3OmniTransformerConfig as-is).
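
Concretely, the suggested change amounts to swapping the import source while leaving the class body as-is. A minimal sketch follows; only the field from the earlier diff is shown, and the remaining multimodal fields are assumed unchanged.

```python
# Sketch of the corrected import, per the analysis above.
from dataclasses import dataclass

from megatron.bridge.models.transformer_config import TransformerConfig  # bridge wrapper
from transformers.models.qwen3_vl.configuration_qwen3_vl import Qwen3VLTextConfig


@dataclass
class Qwen3OmniTransformerConfig(TransformerConfig):
    # existing multimodal fields (vision/audio token IDs, MRoPE sections, ...) unchanged
    hf_text_config: Qwen3VLTextConfig | None = None
```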

# limitations under the License.


from typing import Optional
Contributor


🛠️ Refactor suggestion | 🟠 Major

Use T | None instead of Optional[T] per coding guidelines.

The Optional import and its usages in the get_rope_index signature should be replaced with X | None union syntax (Python 3.10+).

Proposed fix
-from typing import Optional
-
 import torch

Then update the signature:

-    input_ids: Optional[torch.LongTensor] = None,
-    image_grid_thw: Optional[torch.LongTensor] = None,
-    video_grid_thw: Optional[torch.LongTensor] = None,
-    attention_mask: Optional[torch.Tensor] = None,
-    audio_seqlens: Optional[torch.LongTensor] = None,
-    second_per_grids: Optional[torch.Tensor] = None,
+    input_ids: torch.LongTensor | None = None,
+    image_grid_thw: torch.LongTensor | None = None,
+    video_grid_thw: torch.LongTensor | None = None,
+    attention_mask: torch.Tensor | None = None,
+    audio_seqlens: torch.LongTensor | None = None,
+    second_per_grids: torch.Tensor | None = None,

As per coding guidelines: "Use 'T | None' for nullable types instead of 'Optional[T]'."

Also applies to: 60-66

🤖 Prompt for AI Agents
In `@src/megatron/bridge/models/qwen_omni/utils.py` at line 16, Replace the use of
typing.Optional with Python 3.10 union syntax: remove the Optional import and
update any function/type hints that use Optional[...] to use X | None instead;
specifically change the get_rope_index signature and any other occurrences in
the same file (around the block referenced at lines 60-66) to use the `Type |
None` form (e.g., replace Optional[int] with int | None) and ensure imports
reflect removal of Optional.

Comment on lines +86 to +88
        for i, input_ids in enumerate(total_input_ids):
            if attention_mask is not None:
                input_ids = input_ids[attention_mask[i] == 1]
Contributor


⚠️ Potential issue | 🟠 Major

Variable input_ids is shadowed inside the loop, risking subtle bugs.

Line 86 re-binds input_ids (the loop variable from enumerate(total_input_ids)) and line 88 narrows it further via the attention mask. This shadows the function parameter, and on line 227–228, len(input_ids) and input_ids.device refer to the last iteration's masked slice, not the original tensor. While the delta calculation may be intentional, the shadowing is error-prone and violates the guideline "Avoid shadowing variables declared in an outer scope."

Proposed fix: rename the loop variable
-        for i, input_ids in enumerate(total_input_ids):
-            if attention_mask is not None:
-                input_ids = input_ids[attention_mask[i] == 1]
+        for i, _input_ids in enumerate(total_input_ids):
+            if attention_mask is not None:
+                _input_ids = _input_ids[attention_mask[i] == 1]

Then replace all downstream references to input_ids within the loop body (lines 89–227) with _input_ids, and fix line 228 to use total_input_ids.device.

As per coding guidelines: "Avoid shadowing variables declared in an outer scope."

🤖 Prompt for AI Agents
In `@src/megatron/bridge/models/qwen_omni/utils.py` around lines 86 - 88, The loop
reuses the outer-scope name input_ids (for i, input_ids in
enumerate(total_input_ids)), which gets masked and then later used incorrectly;
rename the loop variable to _input_ids (for i, _input_ids in
enumerate(total_input_ids)), apply the attention mask to a local masked_ids =
_input_ids[_mask == 1] (or similar) and replace all downstream references inside
the loop from input_ids to the new name (e.g., masked_ids or _input_ids) so you
no longer shadow the outer variable; also change the places that rely on the
original tensor/device/length (notably the code that uses len(input_ids) and
input_ids.device around the delta calculation) to use total_input_ids (or
total_input_ids.device / len(total_input_ids)) instead.
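
Putting the rename and the downstream device fix together, the loop skeleton might look like the sketch below; the per-sample body is elided, and only the shadow-free naming and the device lookup are shown as an illustration.

```python
# Sketch only: the per-sample computation inside the loop is elided.
for i, _input_ids in enumerate(total_input_ids):
    if attention_mask is not None:
        _input_ids = _input_ids[attention_mask[i] == 1]
    # ... compute per-sample position ids from _input_ids here ...

# After the loop, refer to the original batch tensor, not the last masked slice:
device = total_input_ids.device
```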

Signed-off-by: zhangcheng <chzhang_bj@163.com>
@yaoyu-33
Contributor

@zhang-cheng09: can you check the comments?
We also probably need unit tests and functional tests; we check code coverage upon merging. You can refer to the Qwen3 VL tests.

@zhang-cheng09 zhang-cheng09 force-pushed the qwen3_omni_moe branch 2 times, most recently from 795c990 to d22f2f6 on February 18, 2026 13:06
@yaoyu-33
Contributor

/ok to test d22f2f6

Contributor

yaoyu-33 commented Feb 18, 2026

@zhang-cheng09: happy holidays.

There are two things that need a double check. Please note that for typing we use " | None" to represent Optional.

The audio part: does it work? If not, let's assert that it is not supported.

The lint is failing; please check contribute.md, you need to run pre-commit.

@zhang-cheng09
Author

@yaoyu-33 happy holidays.
First, the audio part does work. It was my mistake to merge the changes with a bug; I will fix it.
Second, I will run pre-commit.

Signed-off-by: zhangcheng <chzhang_bj@163.com>
@yaoyu-33
Contributor

Third-party contributors don't have permission to trigger CI now. Lemme trigger it for you.

@yaoyu-33
Contributor

/ok to test eca748c


@zhang-cheng09
Author

/ok to test eca748c

I do not understand what the problem is with the CI/CD check NeMO/Lauch_Unit_Tests.

Should I update the base branch?
